Cache CLI extractor paths across Actions steps by mario-campos · Pull Request #3950 · github/codeql-action

mario-campos · 2026-06-04T13:53:53Z

Similar to #3943, this PR caches the output of codeql resolve languages, which contains the paths to the various extractors, so that repeated calls to resolveLanguages() run the command only once. Additionally, it re-implements resolveExtractor() as a wrapper over resolveLanguages() (to reuse the cached output) rather than shelling out to codeql resolve extractor.

In one experiment, I counted seven instances of shelling out to codeql resolve extractor. When you dig into the code, you can see why: resolveExtractor() is not called often or from many places, but one caller is isTracedLanguage(), which is wrapped by isScannedLanguage(). These functions are often used in a loop or map over some or all languages, which explains the consecutive executions of codeql resolve extractor.
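
To illustrate the idea, here is a minimal sketch (not the code from this PR). It assumes that codeql resolve languages returns a map from language name to a list of extractor root directories; runResolveLanguages() and the paths it returns are hypothetical stand-ins for the real CLI call.

```typescript
// Assumed output shape: language name -> list of extractor root directories.
type ResolveLanguagesOutput = Partial<Record<string, string[]>>;

// Hypothetical stand-in for shelling out to the CLI; counts invocations.
let cliInvocations = 0;
function runResolveLanguages(): ResolveLanguagesOutput {
  cliInvocations++;
  return {
    javascript: ["/opt/codeql/javascript"], // placeholder paths
    python: ["/opt/codeql/python"],
  };
}

let cachedLanguages: ResolveLanguagesOutput | undefined;

// Runs the (expensive) command at most once per process.
function resolveLanguages(): ResolveLanguagesOutput {
  if (cachedLanguages === undefined) {
    cachedLanguages = runResolveLanguages();
  }
  return cachedLanguages;
}

// Derives the extractor path from the cached output instead of
// invoking `codeql resolve extractor` once per language.
function resolveExtractor(language: string): string {
  const root = resolveLanguages()[language]?.[0];
  if (root === undefined) {
    throw new Error(`No extractor found for language '${language}'.`);
  }
  return root;
}

// A map over languages, like the isTracedLanguage() callers, now costs one CLI call.
const paths = ["javascript", "python", "javascript"].map(resolveExtractor);
```

After the map, cliInvocations is 1 even though resolveExtractor() ran three times.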

In support of the above goals, this PR also adds some additional functions to the json module, to enable validation of the codeql version output.

Risk assessment

For internal use only. Please select the risk level of this change:

Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

Advanced setup - Impacts users who have custom CodeQL workflows.
Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.
Code Quality - The changes impact analyses when analysis-kinds: code-quality.
Other first-party - The changes impact other first-party analyses.
Third-party analyses - The changes affect the upload-sarif action.

Environments:

Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
GHES - Impacts CodeQL workflows on GitHub Enterprise Server.
Testing/None - This change does not impact any CodeQL workflows in production.

How did/will you validate this change?

Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).
Other - Manual/local testing

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.
Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.
Development/testing only - This change cannot cause any failures in production.
Other - Please provide details.

How will you know if something goes wrong after this change is released?

Telemetry - I rely on existing telemetry or have made changes to the telemetry.
Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
Alerts - New or existing monitors will trip if something goes wrong with this change.
Other - Please provide details.

Are there any special considerations for merging or releasing this change?

No special considerations - This change can be merged at any time.
Special considerations - This change should only be merged once certain preconditions are met. Please provide details of those or link to this PR from an internal issue.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Consider adding a changelog entry for this change.
Confirm the readme and docs have been updated if necessary.

henrymercer

Caching these invocations makes a lot of sense! I have a high level comment and a couple of lower level comments.

The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:

const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });

where createPersistedCliCache handles memoising in the Action and persisting between Actions steps with an environment variable.
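
A rough sketch of what such a factory could look like, under stated assumptions: the option names follow the snippet above, but the getOrRun() method, the env parameter (a plain object standing in for the process environment), and the sample version value are hypothetical. A real Action would also need to export the variable so that later steps can see it.

```typescript
// Hypothetical options, following the names used in the comment above.
interface PersistedCliCacheOptions<T> {
  envVar: string; // variable that carries the value between steps
  validate: (value: unknown) => value is T; // guard for the deserialized value
}

interface PersistedCliCache<T> {
  getOrRun(run: () => T): T;
}

// `env` stands in for the process environment in this sketch.
function createPersistedCliCache<T>(
  options: PersistedCliCacheOptions<T>,
  env: Record<string, string | undefined>,
): PersistedCliCache<T> {
  let memo: T | undefined;
  return {
    getOrRun(run) {
      // 1. Memoized within the current process.
      if (memo !== undefined) return memo;
      // 2. Persisted by an earlier step.
      const raw = env[options.envVar];
      if (raw !== undefined) {
        try {
          const parsed: unknown = JSON.parse(raw);
          if (options.validate(parsed)) {
            memo = parsed;
            return parsed;
          }
        } catch {
          // Malformed value: fall through and re-run the command.
        }
      }
      // 3. Run the command and persist its output.
      const fresh = run();
      memo = fresh;
      env[options.envVar] = JSON.stringify(fresh);
      return fresh;
    },
  };
}

// Usage: two caches over the same environment simulate two Actions steps.
interface VersionInfo {
  version: string;
}
const isVersionInfo = (v: unknown): v is VersionInfo =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { version?: unknown }).version === "string";

const sharedEnv: Record<string, string | undefined> = {};
let runs = 0;
const run = (): VersionInfo => {
  runs++;
  return { version: "2.0.0" }; // placeholder value
};

const options = { envVar: "CODEQL_VERSION_INFO", validate: isVersionInfo };
const step1 = createPersistedCliCache(options, sharedEnv);
const step2 = createPersistedCliCache(options, sharedEnv);
step1.getOrRun(run);
const fromStep2 = step2.getOrRun(run); // served from sharedEnv, run() is not called again
```

Validating the persisted value before trusting it means a corrupted or outdated entry degrades to a cache miss rather than a type error later on.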

Some smaller things:

Ideally the cache entry would also depend on getExtraOptionsFromEnv(["resolve","languages"])
We should remove the cache in testing-utils.ts like we do for the CodeQL version cache
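
On the first point, one way to make the entry depend on the extra options is to fold them into the cache key. This is only a sketch: makeCacheKey() is a hypothetical helper, and the option string below is a placeholder, not a real CLI flag.

```typescript
// Hypothetical helper: builds a key from everything that can change the output.
function makeCacheKey(cmd: string, args: string[], extraOptions: string[]): string {
  // JSON encoding keeps element boundaries unambiguous, so ["a b"] and ["a", "b"] differ.
  return JSON.stringify([cmd, args, extraOptions]);
}

const plainKey = makeCacheKey("codeql", ["resolve", "languages"], []);
const extrasKey = makeCacheKey("codeql", ["resolve", "languages"], ["--placeholder-option"]);
// The keys differ, so changed extra options miss the cache instead of returning stale output.
```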

mbg

I agree with @henrymercer's comments regarding a more generalised design for this. I am wondering about the use of environment variables here vs using a file on disk. I don't know if you have already considered this, but we store e.g. the Action configuration on disk as a file. Perhaps that would make sense for these cached CLI results as well.

A general point: could we also make sure to add doc comments for new top-level definitions before merging?

mario-campos · 2026-06-18T15:44:33Z

I've taken your comments into consideration and overhauled the design to be more comprehensive and unified. The cache is now backed by a temporary file instead of the environment. I also identified a few opportunities to refactor some duplicated code into helper functions.

I kept the use of cmd as a key in the cache, but I question whether it's really necessary. I think it's safe to assume that, in most cases, there will only be one instance of codeql in use per job. And, even in the event that there's more than one instance, how likely is it that init would use a different version than autobuild or analyze? If it's not necessary, I would opt to delete it to simplify the code a bit.

Copilot

Warning

Copilot's review of this pull request may be incomplete because some of the changed files are excluded by your Copilot content exclusion settings. See Excluding content from Copilot for details.

Pull request overview

This PR introduces a cross-step cache for selected CodeQL CLI command outputs (notably codeql version and codeql resolve languages) to reduce repeated JVM startups and improve performance across GitHub Actions steps. It also refactors extractor resolution to derive extractor roots from resolve languages (reusing the cached output) and extends the internal JSON validation helpers to support stronger runtime validation of CLI JSON output.

Changes:

Add a new 2-tier command-output cache (in-memory + temp-file) and wire it into codeql.ts for version and resolve languages.
Refactor resolveExtractor() to use resolveLanguages() rather than invoking codeql resolve extractor.
Extend src/json validation helpers (number/object validators and undefinable) and add unit tests; remove now-obsolete util-based version cache.
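
For readers unfamiliar with the pattern, a two-tier cache of this kind could be sketched roughly as follows. This is not the implementation in src/cache.ts: the signature of getCachedOrRun() here, the on-disk layout (one JSON object keyed by cache key), and the file location are all assumptions made for illustration, and the sketch is synchronous for brevity.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical on-disk layout: one JSON object mapping cache keys to outputs.
type CacheFile = Record<string, unknown>;

const inMemoryCache = new Map<string, unknown>();

function readCacheFile(file: string): CacheFile {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as CacheFile;
    }
  } catch {
    // Missing or corrupt file: treat as an empty cache.
  }
  return {};
}

function getCachedOrRun<T>(
  file: string,
  key: string,
  validate: (value: unknown) => value is T,
  run: () => T,
): T {
  // Tier 1: in-memory, for repeated calls within one step.
  const memoized = inMemoryCache.get(key);
  if (memoized !== undefined && validate(memoized)) return memoized;

  // Tier 2: the temp file, for calls made by later steps of the same job.
  const onDisk = readCacheFile(file);
  const persisted = onDisk[key];
  if (validate(persisted)) {
    inMemoryCache.set(key, persisted);
    return persisted;
  }

  // Miss: run the command, then populate both tiers.
  const output = run();
  inMemoryCache.set(key, output);
  fs.writeFileSync(file, JSON.stringify({ ...onDisk, [key]: output }));
  return output;
}

// Usage: clearing the map simulates a new step, where only the file survives.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cli-cache-"));
const file = path.join(dir, "cache.json");
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

let runs = 0;
const run = (): string[] => {
  runs++;
  return ["a", "b"]; // placeholder output
};

getCachedOrRun(file, "resolve-languages", isStringArray, run);
inMemoryCache.clear();
const second = getCachedOrRun(file, "resolve-languages", isStringArray, run);
```

Here run() executes once: the second call misses tier 1 but is served from the file.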

Show a summary per file

File	Description
src/util.ts	Removes the prior in-process/env-var version cache helpers.
src/util.test.ts	Removes tests for the old version-caching behavior.
src/testing-utils.ts	Updates test setup to reset the new command-output cache between tests.
src/status-report.ts	Switches telemetry version lookup to the new cache + `isVersionInfo` guard.
src/json/index.ts	Adds `number`, `object`, and `undefinable` validators to support schema checks.
src/json/index.test.ts	Adds tests for `undefinable` semantics (rejecting `null`).
src/environment.ts	Removes the env var used for the old persisted version cache.
src/codeql.ts	Adds caching wrappers/type guards and refactors extractor resolution and JSON parsing.
src/cache.ts	New: implements the command-output cache (memo + temp file).
src/cache.test.ts	New: tests cache persistence/memo behavior and validation.
lib/entry-points.js	Generated output (content excluded by policy; not reviewed).
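
The undefinable semantics mentioned in the table (accept a missing property, reject null) can be sketched with small validator combinators. The names mirror the summary, but the signatures below are assumptions rather than the actual src/json API, and the VersionInfo shape is illustrative.

```typescript
// Hypothetical validator type: a type guard over unknown input.
type Validator<T> = (value: unknown) => value is T;

const isNumber: Validator<number> = (v): v is number => typeof v === "number";
const isString: Validator<string> = (v): v is string => typeof v === "string";

// Accepts `undefined` (property absent) but not `null`.
function undefinable<T>(inner: Validator<T>): Validator<T | undefined> {
  return (v): v is T | undefined => v === undefined || inner(v);
}

// Checks each listed property of a non-null object with its validator.
function object<T>(shape: Record<string, Validator<unknown>>): Validator<T> {
  return (v): v is T => {
    if (typeof v !== "object" || v === null) return false;
    const record = v as Record<string, unknown>;
    for (const key of Object.keys(shape)) {
      const check = shape[key];
      if (check !== undefined && !check(record[key])) return false;
    }
    return true;
  };
}

// Illustrative shape; the real VersionInfo type may differ.
interface VersionInfo {
  version: string;
  overlayVersion?: number;
}

const isVersionInfo = object<VersionInfo>({
  version: isString,
  overlayVersion: undefinable(isNumber),
});

const acceptsAbsent = isVersionInfo({ version: "2.0.0" });
const acceptsNumber = isVersionInfo({ version: "2.0.0", overlayVersion: 1 });
const rejectsNull = isVersionInfo({ version: "2.0.0", overlayVersion: null });
```

The distinction matters for JSON produced by a CLI: an omitted key and an explicit null are different inputs, and only the former matches an optional TypeScript property.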

Copilot's findings

Files excluded by content exclusion policy (1)

lib/entry-points.js

Files reviewed: 10/11 changed files
Comments generated: 3

+  // Tier 1: the in-memory variable.
+  const memoized = inMemoryCache.get(key);
+  if (memoized !== undefined) {
+    return memoized.output as T;
+  }


+      return getCachedOrRun(
+        CommandCacheKey.ResolveLanguages,
+        cmd,
+        () =>
+          runCliJson<ResolveLanguagesOutput>(cmd, [


github-actions Bot added the size/S Should be easy to review label Jun 4, 2026

henrymercer reviewed Jun 5, 2026

Comment thread src/codeql.ts Outdated

mbg reviewed Jun 10, 2026

Comment thread src/environment.ts Outdated

Comment thread src/util.ts Outdated

mario-campos added 8 commits June 18, 2026 09:58

Cache the output of codeql resolve languages

311292c

Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.

Reimplement resolveExtractor() as wrapper over resolveLanguages()

6010f85

By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.

Validate numbers, objects, and undefinables in the json module

445107e

This commit adds a `number` validator, an `object` validator, an `isNumber` predicate, and `undefinable()` to test optional-but-not-null properties.

Refactor isVersionInfo() to use `json` module

587fcb3

Refactor CLI executions into helper functions

889ae42

This provides a separation of concerns between the memoization and the execution.

Refactor CLI JSON handling into a dedicated runCliJson function

dc8e1e9

Refactor CLI caching with in-memory and file storage

a602287

Rebased onto main; fixups were needed

b18df17

mario-campos force-pushed the mario-campos/cache-cli-resolve-langs branch from c218fd6 to b18df17 Compare June 18, 2026 15:25

github-actions Bot added size/XL May be very hard to review and removed size/S Should be easy to review labels Jun 18, 2026

Add error handling for undefined extractors in language resolution

553eef0

mario-campos marked this pull request as ready for review June 18, 2026 15:44

mario-campos requested a review from a team as a code owner June 18, 2026 15:44

Copilot AI review requested due to automatic review settings June 18, 2026 15:44

Copilot started reviewing on behalf of mario-campos June 18, 2026 15:45 View session

Copilot AI reviewed Jun 18, 2026

mario-campos wants to merge 9 commits into main from mario-campos/cache-cli-resolve-langs

4 participants
